Visualization of Large Complex Data

Steve Elston

12/26/2020

Visualizing Large Complex Data is Difficult

Problem: Modern data sets are growing in size and complexity

Limitation of Scientific Graphics

All scientific graphics are limited to a 2-dimensional projection

Scalable Chart Types

Some chart types are inherently scalable.

Over-plotting

Over-plotting occurs when plot markers lie one on top of another.

Dealing with Over-plotting

What can we do about over-plotting?

Example of Over-plotting

Use Transparency, Marker Size, Downsampling
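
A minimal sketch of these three remedies, assuming matplotlib and NumPy are available. The data are synthetic, and the marker size, alpha value, and 10% sampling fraction are illustrative choices rather than values from the original slides.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
n = 50_000  # synthetic, illustrative data set
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=0.8, size=n)

# Downsampling: keep a random 10% of the points, without replacement
keep = rng.choice(n, size=n // 10, replace=False)

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharex=True, sharey=True)

# Default markers: the dense center collapses into a solid blob
axes[0].scatter(x, y)
axes[0].set_title("Default markers (over-plotted)")

# Smaller markers plus transparency: darkness now reflects point density
axes[1].scatter(x, y, s=2, alpha=0.05)
axes[1].set_title("Small, transparent markers")

# Random subset: fewer markers, same overall shape
axes[2].scatter(x[keep], y[keep], s=2, alpha=0.3)
axes[2].set_title("Random 10% sample")

fig.tight_layout()
```

Calling `fig.savefig("overplotting.png")` writes the three panels to disk for comparison.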

Other Methods to Display Large Data Sets

Alternatives to avoid over-plotting for truly large data sets

Hexbin Plot
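
A hexbin plot replaces individual markers with hexagonal bins colored by the number of points they contain. A minimal sketch with matplotlib's `Axes.hexbin`, using synthetic data; the grid size and color map are illustrative assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
n = 200_000  # synthetic, illustrative data set
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=0.8, size=n)

fig, ax = plt.subplots(figsize=(6, 5))

# Each hexagon is colored by its point count; mincnt=1 leaves empty bins blank
hb = ax.hexbin(x, y, gridsize=40, cmap="viridis", mincnt=1)
fig.colorbar(hb, ax=ax, label="points per bin")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

The plot cost depends on the number of bins rather than the number of points, which is why this chart type scales to large data sets.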

Contour Plot
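
A contour plot summarizes point density with level curves. The sketch below draws contours of a binned 2-D histogram, a simple stand-in for a kernel density estimate; it is not necessarily the method used for the original slide. Data, bin count, and number of levels are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
n = 200_000  # synthetic, illustrative data set
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(scale=0.8, size=n)

# Bin the points on a regular grid; counts[i, j] is x-bin i, y-bin j
counts, xedges, yedges = np.histogram2d(x, y, bins=40)

# Bin centers serve as the contour grid coordinates
xc = 0.5 * (xedges[:-1] + xedges[1:])
yc = 0.5 * (yedges[:-1] + yedges[1:])

fig, ax = plt.subplots(figsize=(6, 5))
# contour expects Z with shape (len(y), len(x)), hence the transpose
cs = ax.contour(xc, yc, counts.T, levels=8)
ax.clabel(cs, inline=True, fontsize=7)
ax.set_xlabel("x")
ax.set_ylabel("y")
```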

Other Methods to Display Large Data Sets

Sometimes a creative alternative is best

Time Series of Box Plots
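
One way to build such a display is to group a long time series by period and draw one box plot per period. A minimal sketch, assuming pandas and matplotlib; the half-hourly series, its seasonal shape, and the column names `date`, `value`, and `month` are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Synthetic half-hourly readings for one (leap) year
dates = pd.date_range("2020-01-01", "2020-12-31 23:30", freq="30min")
seasonal = 10 + 8 * np.sin(2 * np.pi * (dates.dayofyear.to_numpy() - 100) / 366)
df = pd.DataFrame({"date": dates,
                   "value": seasonal + rng.normal(scale=3, size=len(dates))})
df["month"] = df["date"].dt.month

# One array of values per month, in calendar order
groups = [g["value"].to_numpy() for _, g in df.groupby("month")]

fig, ax = plt.subplots(figsize=(10, 4))
# Each box summarizes well over a thousand readings with five statistics
ax.boxplot(groups, showfliers=False)
ax.set_xticks(range(1, 13))
ax.set_xticklabels(["Jan", "Feb", "Mar", "Apr", "May", "Jun",
                    "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"])
ax.set_ylabel("value")
```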

Displays for Complex Data

How can we understand the relationships in complex data with many variables?

Arrays of Plots

Display multiple plot views in an array or grid

Scatter Plot Matrix

A scatter plot matrix is used to investigate relationships among several variables
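
A minimal sketch using `pandas.plotting.scatter_matrix`, which draws every pairwise scatter plot and puts a histogram of each variable on the diagonal. The data frame and its column names `a` through `d` are synthetic placeholders.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix

rng = np.random.default_rng(4)
n = 2_000  # synthetic, illustrative data set
a = rng.normal(size=n)
df = pd.DataFrame({
    "a": a,
    "b": 0.8 * a + rng.normal(scale=0.5, size=n),  # linear in a
    "c": rng.uniform(size=n),                      # unrelated to a
    "d": a ** 2 + rng.normal(scale=0.5, size=n),   # nonlinear in a
})

# Returns a 2-D array of Axes, one per pair of variables
axes = scatter_matrix(df, alpha=0.2, s=5, diagonal="hist", figsize=(8, 8))
```

Small, transparent markers keep the individual panels readable, the same remedy used for over-plotting in a single scatter plot.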

Scatter Plot Matrix

Facet Plots

Facet plots revolutionized statistical graphics starting about 30 years ago

Facet Plots

Like many good ideas, facet plotting was invented several times

Facet Plot with Weather by Season
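
A facet plot draws the same chart for each subset of the data on shared axes. A minimal sketch with plain matplotlib subplots, one panel per season; the weather values and the column names `season`, `humidity`, and `temperature` are synthetic assumptions, not the data behind the original figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(5)
n = 4_000  # synthetic, illustrative data set
seasons = ["winter", "spring", "summer", "fall"]

df = pd.DataFrame({
    "season": rng.choice(seasons, size=n),
    "humidity": rng.uniform(20, 100, size=n),
})
# Hypothetical seasonal temperature offsets, for illustration only
offset = df["season"].map({"winter": 0, "spring": 10, "summer": 20, "fall": 10})
df["temperature"] = offset - 0.1 * df["humidity"] + rng.normal(scale=3, size=n)

# Shared axes make the panels directly comparable
fig, axes = plt.subplots(1, 4, figsize=(14, 3.5), sharex=True, sharey=True)
for ax, season in zip(axes, seasons):
    subset = df[df["season"] == season]
    ax.scatter(subset["humidity"], subset["temperature"], s=4, alpha=0.3)
    ax.set_title(season)
    ax.set_xlabel("humidity")
axes[0].set_ylabel("temperature")
fig.tight_layout()
```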

Cognostics

How can we visualize very high dimensional data?

Cognostic: Counties With Fastest Rate of Housing Price Increase

## 0    DC
## 1    WY
## 2    ND
## 3    MT
## 4    OK
## 5    VT
## 6    MI
## 7    MN
## Name: entity_name, dtype: object
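
The idea behind a ranking like the one above can be sketched as follows: compute one summary statistic (the cognostic) per group, sort the groups by it, and plot only the top few. This is a hedged illustration on synthetic data, not the code that produced the output above; only the column name `entity_name` comes from that output, while `month`, `price`, the entity labels, and the least-squares slope as the cognostic are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(6)

# Synthetic panel: 20 hypothetical entities, 24 monthly prices each
entities = [f"entity_{i:02d}" for i in range(20)]
true_rates = dict(zip(entities, rng.uniform(0, 2, size=len(entities))))
rows = [(name, t, 100 + rate * t + rng.normal(scale=1.0))
        for name, rate in true_rates.items() for t in range(24)]
df = pd.DataFrame(rows, columns=["entity_name", "month", "price"])

# Cognostic: least-squares slope of price against month, one per entity
slopes = {name: np.polyfit(g["month"], g["price"], deg=1)[0]
          for name, g in df.groupby("entity_name")}
cognostic = pd.Series(slopes).sort_values(ascending=False)

# Display only the 8 entities with the largest slope
top = cognostic.head(8).index
fig, axes = plt.subplots(2, 4, figsize=(12, 5), sharex=True, sharey=True)
for ax, name in zip(axes.ravel(), top):
    g = df[df["entity_name"] == name]
    ax.plot(g["month"], g["price"])
    ax.set_title(f"{name}: {cognostic[name]:.2f} per month")
fig.tight_layout()
```

Sorting by the cognostic reduces hundreds or thousands of possible panels to the handful worth looking at first.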

Summary

We have explored these key points